A Fast Normalized Maximum Likelihood Algorithm for Multinomial Data

نویسندگان

  • Petri Kontkanen
  • Petri Myllymäki
چکیده

Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likelihood (NML) criterion, requires computing a sum with an exponential number of terms. Furthermore, in order to apply NML in practice, one often needs to compute a whole table of these exponential sums. In our previous work, we were able to compute this table by a recursive algorithm. The purpose of this paper is to significantly improve the time complexity of this algorithm. The techniques used here are based on the discrete Fourier transform and the convolution theorem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exact Maximum Likelihood Estimation for Word Mixtures

The mixture model for generating document is a generative language model used in information retrieval. While using this model, there are situations that we need to find the maximum likelihood estimation of the density of one multinomial, given fixed mixture weight and the densities of the other multinomial. In this paper, we provide an exact solution and a quick algorithm to solve this problem...

متن کامل

Achievability of asymptotic minimax regret by horizon-dependent and horizon-independent strategies

The normalized maximum likelihood distribution achieves minimax coding (log-loss) regret given a fixed sample size, or horizon, n. It generally requires that n be known in advance. Furthermore, extracting the sequential predictions from the normalized maximum likelihood distribution is computationally infeasible for most statistical models. Several computationally feasible alternative strategie...

متن کامل

Computing the Regret Table for Multinomial Data

Stochastic complexity of a data set is defined as the shortest possible code length for the data obtainable by using some fixed set of models. This measure is of great theoretical and practical importance as a tool for tasks such as model selection or data clustering. In the case of multinomial data, computing the modern version of stochastic complexity, defined as the Normalized Maximum Likeli...

متن کامل

Parent Assignment Is Hard for the MDL, AIC, and NML Costs

Several hardness results are presented for the parent assignment problem: Given m observations of n attributes x1, . . . , xn, find the best parents for xn, that is, a subset of the preceding attributes so as to minimize a fixed cost function. This attribute or feature selection task plays an important role, e.g., in structure learning in Bayesian networks, yet little is known about its computa...

متن کامل

Bearing Fault Detection Based on Maximum Likelihood Estimation and Optimized ANN Using the Bees Algorithm

Rotating machinery is the most common machinery in industry. The root of the faults in rotating machinery is often faulty rolling element bearings. This paper presents a technique using optimized artificial neural network by the Bees Algorithm for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (maximum likelihood estima...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005